feat: add consensus checkpoints for high-stakes tasks#18
Conversation
- Add comprehensive tests for SlackIntegration (37 tests) - Add CircuitBreaker and retry utility tests - Add Semaphore and AgentPool tests - Add review-loops route handler tests - Improve health monitoring tests - Improve request parsing tests - Improve auth route tests Closes #6 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a consensus checkpoint system that requires validation before high/medium risk tasks can spawn subtasks, preventing self-reinforcing agent cycles. Features: - ConsensusService with configurable risk levels and reviewer strategies - Database schema for checkpoints and audit events - REST API endpoints for checkpoint management - MCP tools for consensus operations - WebSocket events for real-time notifications - Integration with task_create to block high-risk subtasks Configuration options: - enabled: toggle consensus system - requireForRiskLevels: ['high', 'medium'] by default - reviewerStrategy: 'adversarial', 'different-model', or 'human' - timeout: checkpoint expiration (default 5 minutes) - maxDepth: maximum task depth before requiring consensus Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughA ConsensusService and consensus workflow were added: task creation now evaluates risk/depth and can create consensus checkpoints; DB schema, types, config, MCP wiring, HTTP routes, WebSocket events, and tests were extended to support consensus checkpoint lifecycle, review, approval/rejection, and auditing. Changes
Sequence DiagramsequenceDiagram
rect rgba(200,230,255,0.5)
actor Client
end
participant MCP as MCP Server
participant Tools as Task Tools
participant Consensus as ConsensusService
participant DB as SQLiteStore
participant API as HTTP API
Client->>MCP: Call task_create (agentType,input,parentTaskId?,riskLevel?)
MCP->>Tools: invoke task_create handler
Tools->>Consensus: estimateRiskLevel(agentType,input)
Consensus-->>Tools: riskLevel
Tools->>Consensus: requiresConsensus(riskLevel, depth, parentTaskId)
Consensus->>DB: query depth / pending checkpoints
Consensus-->>Tools: { requiresConsensus: true/false }
alt requiresConsensus == true
Tools->>Consensus: createCheckpoint(options with proposedSubtasks)
Consensus->>DB: insert consensus_checkpoint + events
Consensus-->>Tools: checkpointId / pending response
Tools-->>Client: respond with pending-consensus payload
else
Tools->>DB: createTask(..., options with riskLevel/depth)
DB-->>Tools: task created
Tools-->>Client: respond with created task
end
Client->>API: GET /api/v1/consensus/pending
API->>Consensus: listPendingCheckpoints()
Consensus->>DB: select pending checkpoints
DB-->>API: checkpoints
API-->>Client: paginated list
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/tasks/consensus-service.ts`:
- Around line 16-17: Remove the unused imports ReviewerStrategy and Task from
the import list in src/tasks/consensus-service.ts; locate the import statement
that currently includes ReviewerStrategy and Task and delete those two symbols
so only actually used exports remain, then run lint to confirm the unused-import
error is resolved.
In `@src/web/routes/consensus.ts`:
- Around line 35-58: The route currently uses
getService().listPendingCheckpoints({ limit, offset }) and passes
checkpoints.length as the pagination total, which is just the page size; change
the handler to obtain the real total (either by calling a new service method
like getService().countPendingCheckpoints() or by modifying
listPendingCheckpoints to return { items, total }), map the returned items
(e.g., items or checkpoints) to the existing payload shape, and pass the real
total into sendPaginated; update references to listPendingCheckpoints,
countPendingCheckpoints (or the new return shape) and sendPaginated accordingly.
In `@tests/unit/integrations/slack.test.ts`:
- Around line 1-23: Save the original global.fetch before replacing it with the
vi.fn() mock and restore it in an afterAll hook to prevent cross-test leakage:
capture the original (e.g., const originalFetch = global.fetch) before assigning
global.fetch = mockFetch, then add an afterAll() that sets global.fetch =
originalFetch and clears mockFetch; reference the existing mockFetch and
global.fetch variables and add the afterAll test hook to perform the restore.
In `@tests/unit/web/request.test.ts`:
- Around line 113-121: In createMockRequestWithBuffer inside the Readable read()
implementation, avoid using a concise arrow that implicitly returns from the
chunks.forEach callback; replace the concise arrow (chunks.forEach(chunk =>
this.push(chunk))) with a block-bodied callback (chunks.forEach(chunk => {
this.push(chunk); })) so the callback does not return a value and the Biome lint
rule is satisfied.
🧹 Nitpick comments (4)
tests/unit/web/routes/auth.test.ts (2)
270-278: Consider asserting the error message for consistency.Unlike other tests in this file, this test only checks
resStatusbut not theresponse.errormessage. Adding an assertion for the error message would make it consistent with the other validation tests.Suggested improvement
await routes.register(mockReq as IncomingMessage, mockRes as ServerResponse); expect(resStatus).toBe(400); + const response = JSON.parse(resBody); + expect(response.error).toBe('All fields are required'); });
854-868: Consider consolidating this test into the existinglogoutdescribe block.There's already a
describe('logout', ...)block starting at line 464 that contains similar tests, including one for logout errors (lines 493-505). Moving this test case into that block would improve organization and avoid the somewhat awkward "logout - additional cases" naming.Suggested consolidation
Move this test into the existing
logoutdescribe block (after line 505):it('should handle logout errors', async () => { vi.mocked(authService.logout).mockRejectedValue(new Error('Logout failed')); // ... existing test }); + + it('should handle unknown logout errors', async () => { + vi.mocked(authService.logout).mockRejectedValue('Unknown error'); + + mockReq = createMockRequest({ + refreshToken: 'refresh-token', + }); + + await routes.logout(mockReq as IncomingMessage, mockRes as ServerResponse); + + expect(resStatus).toBe(500); + const response = JSON.parse(resBody); + expect(response.error).toBe('Logout failed'); + }); });Then remove the separate
describe('logout - additional cases', ...)block.src/mcp/tools/task-tools.ts (1)
509-560: Consider using Zod schemas for input validation consistency.The existing task tools (e.g.,
task_create,task_assign) use Zod schemas for input validation, but the new consensus tools use raw type assertions (params.agentType as string). This creates inconsistent error handling—Zod provides structured validation errors, while type assertions may produce unclear runtime errors.♻️ Example: Add Zod schema for consensus_check
// Add near other schemas at the top of the file const ConsensusCheckInputSchema = z.object({ agentType: z.string().min(1).describe('Agent type for this task'), input: z.string().optional().describe('Task input/description'), parentTaskId: z.string().uuid().optional().describe('Parent task ID'), riskLevel: z.enum(['low', 'medium', 'high']).optional().describe('Risk level override'), }); // Then in the handler: handler: async (params: Record<string, unknown>) => { if (!consensusService) { return { success: false, error: 'Consensus service not available' }; } try { const input = ConsensusCheckInputSchema.parse(params); // ... rest of handler using input.agentType, input.input, etc. } }src/tasks/consensus-service.ts (1)
455-460: Consider making the expiration check interval configurable.The hardcoded 60-second interval works but could be made configurable for different deployment scenarios (e.g., shorter intervals for time-sensitive environments).
| ReviewerStrategy, | ||
| Task, |
There was a problem hiding this comment.
Remove unused imports to fix lint errors.
The pipeline is failing due to ReviewerStrategy and Task being imported but never used in this file.
🐛 Proposed fix
import type {
AgentStackConfig,
ConsensusConfig,
ConsensusCheckpoint,
ConsensusDecision,
ConsensusStatus,
ProposedSubtask,
TaskRiskLevel,
- ReviewerStrategy,
- Task,
} from '../types.js';📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ReviewerStrategy, | |
| Task, | |
| import type { | |
| AgentStackConfig, | |
| ConsensusConfig, | |
| ConsensusCheckpoint, | |
| ConsensusDecision, | |
| ConsensusStatus, | |
| ProposedSubtask, | |
| TaskRiskLevel, | |
| } from '../types.js'; |
🧰 Tools
🪛 GitHub Actions: CI
[error] 16-16: ReviewerStrategy is defined but never used. (@typescript-eslint/no-unused-vars)
🪛 GitHub Check: Lint
[failure] 17-17:
'Task' is defined but never used
[failure] 16-16:
'ReviewerStrategy' is defined but never used
🤖 Prompt for AI Agents
In `@src/tasks/consensus-service.ts` around lines 16 - 17, Remove the unused
imports ReviewerStrategy and Task from the import list in
src/tasks/consensus-service.ts; locate the import statement that currently
includes ReviewerStrategy and Task and delete those two symbols so only actually
used exports remain, then run lint to confirm the unused-import error is
resolved.
| import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; | ||
| import { | ||
| SlackIntegration, | ||
| getSlackIntegration, | ||
| resetSlackIntegration, | ||
| type SlackConfig, | ||
| type SlackMessage, | ||
| } from '../../../src/integrations/slack.js'; | ||
|
|
||
| // Mock the logger | ||
| vi.mock('../../../src/utils/logger.js', () => ({ | ||
| logger: { | ||
| child: () => ({ | ||
| debug: vi.fn(), | ||
| error: vi.fn(), | ||
| }), | ||
| }, | ||
| })); | ||
|
|
||
| // Mock global fetch | ||
| const mockFetch = vi.fn(); | ||
| global.fetch = mockFetch; | ||
|
|
There was a problem hiding this comment.
Restore global.fetch after this suite to avoid cross-test leakage.
Right now global.fetch stays mocked for subsequent suites. Capture the original and restore it in an afterAll.
✅ Suggested fix
-import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
+import { describe, it, expect, vi, beforeEach, afterEach, afterAll } from 'vitest';
@@
-// Mock global fetch
-const mockFetch = vi.fn();
-global.fetch = mockFetch;
+// Mock global fetch
+const originalFetch = global.fetch;
+const mockFetch = vi.fn();
+global.fetch = mockFetch;
@@
afterEach(() => {
vi.clearAllMocks();
resetSlackIntegration();
});
+
+ afterAll(() => {
+ global.fetch = originalFetch;
+ });🤖 Prompt for AI Agents
In `@tests/unit/integrations/slack.test.ts` around lines 1 - 23, Save the original
global.fetch before replacing it with the vi.fn() mock and restore it in an
afterAll hook to prevent cross-test leakage: capture the original (e.g., const
originalFetch = global.fetch) before assigning global.fetch = mockFetch, then
add an afterAll() that sets global.fetch = originalFetch and clears mockFetch;
reference the existing mockFetch and global.fetch variables and add the afterAll
test hook to perform the restore.
| describe('parseRequestBody - additional edge cases', () => { | ||
| function createMockRequestWithBuffer(chunks: Buffer[]): IncomingMessage { | ||
| const stream = new Readable({ | ||
| read() { | ||
| chunks.forEach(chunk => this.push(chunk)); | ||
| this.push(null); | ||
| } | ||
| }); | ||
| return stream as unknown as IncomingMessage; |
There was a problem hiding this comment.
Fix Biome lint: avoid returning from forEach callback (Line 117).
The concise arrow returns this.push(...), which trips the lint rule. Wrap the body in braces to avoid returning a value.
🔧 Proposed fix
- chunks.forEach(chunk => this.push(chunk));
+ chunks.forEach(chunk => {
+ this.push(chunk);
+ });📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| describe('parseRequestBody - additional edge cases', () => { | |
| function createMockRequestWithBuffer(chunks: Buffer[]): IncomingMessage { | |
| const stream = new Readable({ | |
| read() { | |
| chunks.forEach(chunk => this.push(chunk)); | |
| this.push(null); | |
| } | |
| }); | |
| return stream as unknown as IncomingMessage; | |
| describe('parseRequestBody - additional edge cases', () => { | |
| function createMockRequestWithBuffer(chunks: Buffer[]): IncomingMessage { | |
| const stream = new Readable({ | |
| read() { | |
| chunks.forEach(chunk => { | |
| this.push(chunk); | |
| }); | |
| this.push(null); | |
| } | |
| }); | |
| return stream as unknown as IncomingMessage; |
🧰 Tools
🪛 Biome (2.3.13)
[error] 117-117: This callback passed to forEach() iterable method should not return a value.
Either remove this return or remove the returned value.
(lint/suspicious/useIterableCallbackReturn)
🤖 Prompt for AI Agents
In `@tests/unit/web/request.test.ts` around lines 113 - 121, In
createMockRequestWithBuffer inside the Readable read() implementation, avoid
using a concise arrow that implicitly returns from the chunks.forEach callback;
replace the concise arrow (chunks.forEach(chunk => this.push(chunk))) with a
block-bodied callback (chunks.forEach(chunk => { this.push(chunk); })) so the
callback does not return a value and the Biome lint rule is satisfied.
- Fix pagination total count: add countPendingCheckpoints() to return actual total instead of page size - Make risk estimation configurable: add highRiskAgentTypes, mediumRiskAgentTypes, highRiskPatterns, mediumRiskPatterns to config - Use unref() on expiration interval to prevent keeping process alive - Add warning log when creating checkpoint with empty proposedSubtasks - Add tests for new functionality (7 new test cases) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/tasks/consensus-service.ts`:
- Around line 62-71: The issue is that userConfig is inferred as {} so accessing
optional consensus properties causes a TypeScript error; explicitly type
userConfig as a partial/optional consensus config (for example use
Partial<ConsensusConfig> or the appropriate interface used for
appConfig.consensus) or narrow it with a type guard before merging, then merge
into this.config using DEFAULT_CONFIG and the typed userConfig so properties
like highRiskAgentTypes, mediumRiskAgentTypes, highRiskPatterns, and
mediumRiskPatterns are recognized by the compiler; update the declaration of
userConfig (the appConfig.consensus ?? {}) and ensure the merge into this.config
still preserves defaults via DEFAULT_CONFIG.
| const userConfig = appConfig.consensus ?? {}; | ||
| this.config = { | ||
| ...DEFAULT_CONFIG, | ||
| ...userConfig, | ||
| // Preserve defaults for optional array fields when not explicitly set | ||
| highRiskAgentTypes: userConfig.highRiskAgentTypes ?? DEFAULT_CONFIG.highRiskAgentTypes, | ||
| mediumRiskAgentTypes: userConfig.mediumRiskAgentTypes ?? DEFAULT_CONFIG.mediumRiskAgentTypes, | ||
| highRiskPatterns: userConfig.highRiskPatterns ?? DEFAULT_CONFIG.highRiskPatterns, | ||
| mediumRiskPatterns: userConfig.mediumRiskPatterns ?? DEFAULT_CONFIG.mediumRiskPatterns, | ||
| }; |
There was a problem hiding this comment.
Fix TypeScript build error: userConfig type inference issue.
The pipeline is failing because when appConfig.consensus is undefined, userConfig has type {}, and TypeScript doesn't recognize the optional properties being accessed on lines 67-70.
🐛 Proposed fix
- const userConfig = appConfig.consensus ?? {};
+ const userConfig = appConfig.consensus;
this.config = {
...DEFAULT_CONFIG,
...userConfig,
// Preserve defaults for optional array fields when not explicitly set
- highRiskAgentTypes: userConfig.highRiskAgentTypes ?? DEFAULT_CONFIG.highRiskAgentTypes,
- mediumRiskAgentTypes: userConfig.mediumRiskAgentTypes ?? DEFAULT_CONFIG.mediumRiskAgentTypes,
- highRiskPatterns: userConfig.highRiskPatterns ?? DEFAULT_CONFIG.highRiskPatterns,
- mediumRiskPatterns: userConfig.mediumRiskPatterns ?? DEFAULT_CONFIG.mediumRiskPatterns,
+ highRiskAgentTypes: userConfig?.highRiskAgentTypes ?? DEFAULT_CONFIG.highRiskAgentTypes,
+ mediumRiskAgentTypes: userConfig?.mediumRiskAgentTypes ?? DEFAULT_CONFIG.mediumRiskAgentTypes,
+ highRiskPatterns: userConfig?.highRiskPatterns ?? DEFAULT_CONFIG.highRiskPatterns,
+ mediumRiskPatterns: userConfig?.mediumRiskPatterns ?? DEFAULT_CONFIG.mediumRiskPatterns,
};🧰 Tools
🪛 GitHub Check: Build
[failure] 70-70:
Property 'mediumRiskPatterns' does not exist on type '{}'.
[failure] 69-69:
Property 'highRiskPatterns' does not exist on type '{}'.
[failure] 68-68:
Property 'mediumRiskAgentTypes' does not exist on type '{}'.
[failure] 67-67:
Property 'highRiskAgentTypes' does not exist on type '{}'.
🪛 GitHub Check: Type Check
[failure] 70-70:
Property 'mediumRiskPatterns' does not exist on type '{}'.
[failure] 69-69:
Property 'highRiskPatterns' does not exist on type '{}'.
[failure] 68-68:
Property 'mediumRiskAgentTypes' does not exist on type '{}'.
[failure] 67-67:
Property 'highRiskAgentTypes' does not exist on type '{}'.
🤖 Prompt for AI Agents
In `@src/tasks/consensus-service.ts` around lines 62 - 71, The issue is that
userConfig is inferred as {} so accessing optional consensus properties causes a
TypeScript error; explicitly type userConfig as a partial/optional consensus
config (for example use Partial<ConsensusConfig> or the appropriate interface
used for appConfig.consensus) or narrow it with a type guard before merging,
then merge into this.config using DEFAULT_CONFIG and the typed userConfig so
properties like highRiskAgentTypes, mediumRiskAgentTypes, highRiskPatterns, and
mediumRiskPatterns are recognized by the compiler; update the declaration of
userConfig (the appConfig.consensus ?? {}) and ensure the merge into this.config
still preserves defaults via DEFAULT_CONFIG.
Summary
Implements Issue #5 - Consensus Checkpoints for High-Stakes Tasks.
This PR adds a consensus checkpoint system that requires validation before high/medium risk tasks can spawn subtasks, preventing self-reinforcing agent cycles.
Key Features
consensus_checkpointsandconsensus_checkpoint_events/api/v1/consensus/*consensus_check,consensus_approve,consensus_reject,consensus_list_pending,consensus_gettask_createConfiguration
{ "consensus": { "enabled": true, "requireForRiskLevels": ["high", "medium"], "reviewerStrategy": "adversarial", "timeout": 300000, "maxDepth": 5, "autoReject": false } }Files Changed
src/types.ts- Type definitions for consensus systemsrc/utils/config.ts- Zod schema for consensus configsrc/memory/sqlite-store.ts- Database schema and CRUD operationssrc/tasks/consensus-service.ts- Core consensus logic (new)src/web/routes/consensus.ts- REST API endpoints (new)src/mcp/tools/task-tools.ts- MCP tool integrationsrc/web/websocket/event-bridge.ts- WebSocket eventsTest Coverage
Test plan
consensus.enabled: true🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.